Skip to content

feat: add network error detection and automatic retry logic#13

Open
gedeagas wants to merge 5 commits into
mainfrom
bekasi-uq
Open

feat: add network error detection and automatic retry logic#13
gedeagas wants to merge 5 commits into
mainfrom
bekasi-uq

Conversation

@gedeagas
Copy link
Copy Markdown
Owner

Summary

  • Detect network errors (ENOTFOUND, ECONNREFUSED, ETIMEDOUT, etc.) in the agent worker and automatically retry with exponential backoff (up to 3 attempts)
  • Show an offline banner at the top of the app when connectivity is lost, with a "Back online" transition animation on reconnect
  • Add a NetworkErrorPrompt in chat input for manual retry/dismiss when auto-retries are exhausted

Layers touched

  • Main process (src/main/) - services, IPC handlers
  • Preload (src/preload/) - context bridge API
  • Renderer (src/renderer/) - components, stores, lib
  • Styles (App.css)

Changes

Main process - Agent worker:

  • agentWorker/errorClassifier.ts: Added 'network' error kind with 16 regex patterns (ENOTFOUND, ECONNREFUSED, ETIMEDOUT, socket hang up, fetch failed, etc.). Network patterns are checked before auth patterns so connectivity issues to auth endpoints are treated as network errors.
  • agentWorker/core.ts: Added retryOnNetwork() method with per-session retry counter, exponential backoff (2s -> 4s -> 8s), and abortableSleep that respects session cancellation. Retry count resets on success, stop, or close. Emits a new retrying event so the renderer can show progress.

Renderer - Online status (lib/online.ts):

  • Added onOffline() callback registration (complement to existing onOnline())
  • Added useOnlineStatus() React hook via useSyncExternalStore for tear-free online/offline reads

Renderer - Components:

  • shared/OfflineBanner.tsx: Persistent amber warning bar with phase state machine (hidden -> offline -> reconnected -> dismissing). Shows checkmark + "Back online" for 2s before sliding away.
  • Center/NetworkErrorPrompt.tsx: Inline prompt in chat input area (reuses auth-error-prompt styles) with retry/dismiss buttons. Retry is disabled while offline.
  • Center/ChatInput.tsx: Integrates NetworkErrorPrompt, disables send button while offline
  • Center/ChatView.tsx: Renders OfflineBanner at top of chat area

Stores (sessions/):

  • storeTypes.ts: Added pendingNetworkError to session state, retryAfterNetworkError and dismissNetworkError actions
  • handlers/networkErrorActions.ts: New action creator - retry re-sends last user message, dismiss clears error state
  • eventHandler.ts: Handles new retrying event to update session activity text
  • handlers/handleWaiting.ts: Sets pendingNetworkError flag on network error kind

Styles:

  • offline-banner.css: Amber/green banner with slide-up dismiss animation

i18n:

  • Added keys for networkError, networkErrorMessage, networkRetry, offlineSendDisabled, offline.banner, offline.reconnected across en/ja/id locales

How to test

  1. yarn dev
  2. Auto-retry: Start a Claude session, then disconnect WiFi mid-response. Observe retry activity text ("Retrying 1/3...") in the session status. Reconnect WiFi - the session should resume automatically.
  3. Offline banner: Toggle WiFi off - an amber "You are offline" banner should appear below the header. Toggle WiFi on - banner should briefly show "Back online" with a checkmark, then slide away.
  4. Manual retry prompt: Disconnect WiFi and let all 3 auto-retries exhaust. A "Network error" prompt should appear in the chat input area. Reconnect WiFi and click "Retry" to re-send the last message.
  5. Send disabled while offline: With WiFi off, the send button in chat input should be disabled.

Checklist

  • Self-reviewed the diff
  • Tested locally with yarn dev
  • Types pass - yarn typecheck
  • No console errors or warnings in DevTools
  • IPC changes are threaded through all 3 layers (main -> preload -> renderer)
  • New state is added to the correct Zustand store

- Detect network errors (ENOTFOUND, ECONNREFUSED, ETIMEDOUT, etc.) separately from auth errors
- Implement exponential backoff retry strategy (up to 3 attempts) for network failures
- Add retrying event type to communicate retry attempts to UI
- Show offline banner and disable network operations when no connectivity
- Add useOnlineStatus hook for online/offline state management
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class network connectivity handling across the agent worker and UI: network errors are classified explicitly, the worker auto-retries with exponential backoff, and the renderer surfaces offline state + manual retry UX when retries are exhausted.

Changes:

  • Main: classify network errors and add automatic retry/backoff with a new retrying event.
  • Renderer: introduce an online-status store/hook, offline banner UI, and disable/annotate actions while offline.
  • Sessions store: track pendingNetworkError and add retry/dismiss actions + UI prompt.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/renderer/types/session.ts Adds pendingNetworkError flag to session type.
src/renderer/styles/offline-banner.css New styling + animations for the offline banner.
src/renderer/styles/index.css Imports the new offline banner stylesheet.
src/renderer/styles/checks.css Adds stale styling for “last updated” when offline.
src/renderer/styles/chat-input.css Adds offline hint styling in chat input.
src/renderer/store/sessions/storeTypes.ts Adds network error retry/dismiss actions to the sessions store API.
src/renderer/store/sessions/store.ts Wires in the new network error actions creator.
src/renderer/store/sessions/handlers/networkErrorActions.ts Implements retry/dismiss behavior for exhausted network errors.
src/renderer/store/sessions/handlers/handleWaiting.ts Sets pendingNetworkError on errorKind === 'network'.
src/renderer/store/sessions/eventHandler.ts Handles new retrying event to update session activity text.
src/renderer/store/sessions/tests/draftActions.test.ts Updates test state builder to include new actions.
src/renderer/locales/ja/right.json Adds offline-disabled tooltip text (JA).
src/renderer/locales/ja/common.json Adds offline banner + cached data strings (JA).
src/renderer/locales/ja/center.json Adds offline hint + network error prompt strings (JA).
src/renderer/locales/id/right.json Adds offline-disabled tooltip text (ID).
src/renderer/locales/id/common.json Adds offline banner + cached data strings (ID).
src/renderer/locales/id/center.json Adds offline hint + network error prompt strings (ID).
src/renderer/locales/en/right.json Adds offline-disabled tooltip text (EN).
src/renderer/locales/en/common.json Adds offline banner + cached data strings (EN).
src/renderer/locales/en/center.json Adds offline hint + network error prompt strings (EN).
src/renderer/lib/online.ts Adds onOffline() + useOnlineStatus() via useSyncExternalStore.
src/renderer/components/shared/OfflineBanner.tsx Implements offline/reconnected banner state machine.
src/renderer/components/Right/PushBanner.tsx Disables push while offline and shows tooltip.
src/renderer/components/Right/ChecksView.tsx Shows cached/stale indicator while offline.
src/renderer/components/Right/ChecksSections.tsx Disables PR/pull/push/create actions while offline.
src/renderer/components/Right/ChangesView.tsx Disables pull while offline and adjusts tooltip content.
src/renderer/components/Center/NetworkErrorPrompt.tsx Adds manual retry/dismiss prompt when network retries are exhausted.
src/renderer/components/Center/ChatView.tsx Prevents sending while offline.
src/renderer/components/Center/ChatInput.tsx Integrates network error prompt + offline hint.
src/renderer/components/Center/ChatHeader.tsx Updates send tooltip text when offline.
src/renderer/App.tsx Mounts the global OfflineBanner.
src/main/services/agentWorker/errorClassifier.ts Adds network error kind + matching patterns.
src/main/services/agentWorker/core.ts Adds exponential backoff retry logic and emits retrying events.
src/main/services/agentTypes.ts Extends WorkerEvent (network errorKind + new retrying event).
src/main/services/tests/errorClassifier.test.ts Adds unit tests for network error classification.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +45 to +50
// Clean up dismissing state after animation completes
useEffect(() => {
if (phase !== 'dismissing') return
const timer = setTimeout(() => dispatch({ type: 'dismiss' }), 300)
return () => clearTimeout(timer)
}, [phase])
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OfflineBanner never transitions back to hidden after the dismiss animation. The phase === 'dismissing' effect dispatches { type: 'dismiss' } again, which keeps the state stuck in dismissing (so the banner stays mounted/off-screen and may continue to be announced by assistive tech). Introduce a separate action/state (e.g. hide) to set phase to hidden after the slide-up completes, and dispatch that from the 300ms timeout instead of dismiss again.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added a hide action to the reducer that transitions from dismissing -> hidden. The 300ms timeout now dispatches { type: \"hide\" } instead of { type: \"dismiss\" }, so the banner is properly unmounted after the slide-up animation completes.

Comment on lines +5 to +19
.offline-banner {
position: fixed;
top: 0;
left: 0;
right: 0;
z-index: var(--z-sticky);
display: flex;
align-items: center;
justify-content: center;
gap: var(--space-8);
height: var(--space-32);
background: var(--amber-subtle, rgba(245, 158, 11, 0.12));
border-bottom: 1px solid var(--amber);
animation: offline-banner-slide-down var(--duration-fast) ease-out forwards;
-webkit-app-region: no-drag;
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offline banner is position: fixed; top: 0 with a high z-index, which will overlay the app’s existing .drag-region titlebar area (52px tall) and can reduce/disable window dragging and top-level controls beneath it. Consider offsetting the banner below the drag region (e.g. top: 52px / a shared CSS variable) or rendering it inside the panels below the drag-region so it doesn’t cover the titlebar.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed top: 0 to top: 52px to sit below the drag-region titlebar, so window dragging and traffic light controls remain accessible.

Comment on lines +480 to +486
try {
await abortableSleep(delay, signal)
} catch {
// Aborted during sleep - user stopped the session
this.networkRetryCount.delete(sessionId)
return false
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user stops the session while waiting in the retry backoff sleep, abortableSleep rejects and retryOnNetwork returns false, but the caller continues and emits an error event. This will surface a network error in the UI even though the user explicitly cancelled. Consider returning a distinct result for abort vs. exhausted retries (or re-check signal.aborted after awaiting retryOnNetwork) and bail out like the normal aborted-path instead of emitting an error.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. After retryOnNetwork returns false, we now re-check abortController.signal.aborted and emit done (clean stop) instead of error. Applied in both startSession and sendMessage catch blocks.

Comment on lines +476 to +479
this.log(sessionId, `Network error, retrying ${attempt}/${MAX_NETWORK_RETRIES} in ${delay}ms: ${errorMsg}`)
this.networkRetryCount.set(sessionId, attempt)
this.emit({ type: 'retrying', sessionId, attempt, maxAttempts: MAX_NETWORK_RETRIES, delayMs: delay })

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worker emits a new { type: 'retrying' } event here, but src/main/services/agent.ts (AgentCoordinator) currently doesn’t handle/forward a retrying case in handleWorkerEvent, so the renderer will never receive these events and the UI activity update in src/renderer/store/sessions/eventHandler.ts won’t run. Add routing for this event through the coordinator (and any IPC boundary) so it reaches the renderer.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added case 'retrying' to handleWorkerEvent in AgentCoordinator (agent.ts) so the event is forwarded to the renderer via sendEvent.

Comment thread src/main/services/agentWorker/core.ts Outdated
Comment on lines +33 to +34
const timer = setTimeout(resolve, ms)
signal.addEventListener('abort', () => { clearTimeout(timer); reject(new Error('Aborted')) }, { once: true })
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abortableSleep attaches an abort listener to the AbortSignal but never removes it when the timeout completes normally. That can leave a dangling listener (and can also attempt to reject after the promise has already resolved if an abort happens later). Consider cleaning up the listener on resolve (e.g. keep a named handler and remove it in the timeout callback) to avoid leaks.

Suggested change
const timer = setTimeout(resolve, ms)
signal.addEventListener('abort', () => { clearTimeout(timer); reject(new Error('Aborted')) }, { once: true })
const onAbort = () => {
clearTimeout(timer)
reject(new Error('Aborted'))
}
const timer = setTimeout(() => {
signal.removeEventListener('abort', onAbort)
resolve()
}, ms)
signal.addEventListener('abort', onAbort, { once: true })

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Named the abort handler and added signal.removeEventListener(\"abort\", onAbort) inside the timeout callback so the listener is cleaned up on normal resolve.

Comment on lines +267 to +274
// Auto-retry on network errors with exponential backoff
if (errorKind === 'network') {
const retried = await this.retryOnNetwork(sessionId, message, abortController.signal)
if (retried) {
await this.startSession(sessionId, worktreeId, projectName, worktreePath, prompt, model, thinking, planMode, sessionName, settings, images, additionalDirectories, linkedWorktreeContext, connectedDeviceId, mobileFramework)
return
}
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Network auto-retry is new behavior in AgentWorker (backoff timing, max attempts, and retrying emission), but there are no accompanying unit tests to verify: (1) a network-classified error triggers retries, (2) retrying events are emitted with the right attempt/max/delay, and (3) an error event is only emitted after retries are exhausted. Since src/main/services/__tests__/agentWorker.test.ts already covers other startSession/sendMessage behaviors, adding targeted tests for the retry path would help prevent regressions.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. Will add unit tests for the retry path as a follow-up.

…r positioning

- Clean up abort listeners in abortableSleep to prevent memory leaks
- Treat user abort during backoff as clean stop instead of error
- Add 'retrying' event handler to forward network retry attempts to UI
- Position offline banner below titlebar to prevent overlap
- Fix offline banner dismiss animation to properly transition to hidden state
# Conflicts:
#	src/renderer/locales/en/center.json
#	src/renderer/locales/en/right.json
#	src/renderer/locales/id/center.json
#	src/renderer/locales/id/right.json
#	src/renderer/locales/ja/center.json
#	src/renderer/locales/ja/right.json
# Conflicts:
#	src/main/services/agentWorker/core.ts
#	src/renderer/components/Center/ChatInput.tsx
#	src/renderer/store/sessions/store.ts
- Add afterEach to vitest imports for timer cleanup
- Add comprehensive tests for startSession retry logic with exponential backoff
- Add tests for sendMessage retry behavior during resume
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants